Simultaneous and exact interval estimates for the contrast of two groups based on an extremely high dimensional variable: application to mass spec data
Authors
Abstract
MOTIVATION: Analysis of high-throughput proteomic/genomic data, in particular surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) data and microarray data, has led to a multitude of techniques aimed at identifying potential biomarkers. Most statistical techniques for comparing two groups are based on qualitative measures such as the P-value. A quantitative approach, such as interval estimation for the contrasts of two groups, is more appealing.
RESULTS: We have devised a simultaneous confidence bands method that uses a permutation scheme to detect potential biomarkers discriminating two treatment groups in high-dimensional datasets, while controlling the overall confidence coverage level. For example, for the SELDI-TOF MS data, we deal with the entire spectrum simultaneously and construct (1 - alpha) confidence bands for the mean differences between groups. Peaks are then identified based on the maximal differences between the groups, as determined by the confidence bands. The method gives both a qualitative result (P-value) and a quantitative one (magnitude of difference). The Clinical Proteomics Programs Databank's ovarian cancer dataset and data from in-house samples containing known spiked-in proteins were analyzed. We identified potential biomarkers similar to those described in previous analyses of the ovarian cancer data. However, although these markers differ with high statistical significance between the cancer and normal groups, our analysis indicated that the absolute difference between the two groups was minimal. In addition, we found markers beyond those previously described, with greater differences in average intensities. The proposed confidence bands method successfully detected the spiked-in peaks, as well as secondary peaks generated by adducts and double-charged species. We also illustrate our method on paired gene expression data from a prostate cancer microarray experiment by constructing confidence bands for the fold changes between cancer and normal samples.
AVAILABILITY: The R package 'seie.zip' (license: GNU GPL) is publicly available at http://research2.dfci.harvard.edu/dfci/MS_spike-in_data/
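The idea of permutation-based simultaneous confidence bands can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce the 'seie' package: the function names, the pooled standard error, the max-|t| calibration, and the simulated data are all assumptions chosen for illustration. Group labels are permuted on group-centred data, the maximum standardized difference across all features is recorded for each permutation, and its (1 - alpha) quantile sets a single band width applied to every feature.

```python
import math
import random


def column_means(rows):
    """Mean of each feature (column) over the samples (rows)."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pooled_se(x, y):
    """Pooled standard error of the mean difference, one value per feature."""
    n1, n2 = len(x), len(y)
    mx, my = column_means(x), column_means(y)
    se = []
    for j in range(len(mx)):
        ss = sum((r[j] - mx[j]) ** 2 for r in x)
        ss += sum((r[j] - my[j]) ** 2 for r in y)
        se.append(math.sqrt(ss / (n1 + n2 - 2) * (1 / n1 + 1 / n2)))
    return se


def simultaneous_bands(x, y, alpha=0.05, n_perm=500, seed=0):
    """Return (diff, lower, upper) with joint (1 - alpha) coverage (sketch)."""
    rng = random.Random(seed)
    n1 = len(x)
    mx, my = column_means(x), column_means(y)
    diff = [a - b for a, b in zip(mx, my)]
    se = pooled_se(x, y)

    # Centre each group so the permuted data carry no true group difference.
    centred = [[v - m for v, m in zip(r, mx)] for r in x]
    centred += [[v - m for v, m in zip(r, my)] for r in y]

    max_stats = []
    for _ in range(n_perm):
        rng.shuffle(centred)  # random reassignment of samples to groups
        px, py = centred[:n1], centred[n1:]
        pdiff = [a - b for a, b in zip(column_means(px), column_means(py))]
        pse = pooled_se(px, py)
        # Largest standardized difference over the whole feature set.
        max_stats.append(
            max((abs(d) / s for d, s in zip(pdiff, pse) if s > 0), default=0.0)
        )

    # Empirical (1 - alpha) quantile of the permutation maxima.
    max_stats.sort()
    k = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    crit = max_stats[k]

    lower = [d - crit * s for d, s in zip(diff, se)]
    upper = [d + crit * s for d, s in zip(diff, se)]
    return diff, lower, upper


# Simulated data (hypothetical): 20 features, 15 samples per group,
# with feature 3 shifted in group A to mimic a spiked-in peak.
rng = random.Random(1)
p = 20
group_a = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(15)]
group_b = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(15)]
for row in group_a:
    row[3] += 5.0

diff, lower, upper = simultaneous_bands(group_a, group_b, alpha=0.05, n_perm=200)
# Features whose band excludes zero are candidate markers.
flagged = [j for j in range(p) if lower[j] > 0 or upper[j] < 0]
print(flagged)
```

Because one critical value is shared by all features, each band reports the magnitude of the group difference as well as whether it excludes zero, which is the quantitative reading the abstract argues for. For paired data such as the prostate cancer example, a sign-flipping permutation of within-pair differences would be the analogous choice; that variant is not shown here.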
Similar resources
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...
Quantification of identical and unique segments in ethylene-propylene copolymers using two dimensional liquid chromatography with infra-red detection
Hyphenating High Temperature High Performance Liquid Chromatography (HT-HPLC) with High Temperature Size Exclusion Chromatography (HT-SEC) (High Temperature Two Dimensional Liquid Chromatography (HT-HPLC x HT-SEC or HT 2D-LC)) leads to an isocratic elution in the second dimension, which in turn enables the use of an IR detector (quantitative detection) for monitoring the eluting polymers. Experimental...
An Algorithm based on Predicting the Interface in Phase Change Materials
Phase change materials are substances that absorb and release thermal energy during the process of melting and freezing. This characteristic makes phase change material (PCM) a favourite choice for integration in buildings. The Stefan problem, which includes melting and solidification in PCM materials, is a practical problem in many engineering processes. The position of the moving boundary, its veloci...
Confidence interval for the two-parameter exponentiated Gumbel distribution based on record values
In this paper, we study the estimation problems for the two-parameter exponentiated Gumbel distribution based on lower record values. An exact confidence interval and an exact joint confidence region for the parameters are constructed. A simulation study is conducted to study the performance of the proposed confidence interval and region. Finally, a numerical example with a real data set is gi...
The Effect of Protein Kinase-B on FOXO Autophagy Family Proteins (FOXO1 and FOXO3a) Following High Intensity Interval Training in the Left Ventricle of the Heart of Diabetic Rats by Streptozotocin and Nicotinamide
Background: FOXO family proteins are important factors in autophagy pathway. Protein kinase-B is an important regulator for this family that can be regulated through exercise training. Therefore, the aim of this study is to investigate the effect of protein kinase-B (PKB) on FOXO autophagy family proteins (FOXO1 and FOXO3a) following high intensity interval training (HIIT) in the left ventricle...
Journal title:
- Bioinformatics
Volume 23, Issue 12
Pages -
Publication date: 2007